Optimizing Multiple Queries against XML Streams
Abstract
Processing and querying streams, XML streams in particular, has recently become a widely recognized area of interest in both research and industry. In traditional query evaluation for databases, multiple queries against the same data can be evaluated sequentially. In a streamed environment, by contrast, only the simultaneous execution of multiple queries is feasible, because sequential evaluation requires multiple passes over the stream. This work presents an overview of techniques for optimizing multiple queries posed against a stream of XML data. Building upon the SPEX query engine [79; 105], it presents the problem of finding a cost-optimal query plan that allows the simultaneous evaluation of multiple queries against the same stream. The problem is shown to be not only hard to solve but also hard to approximate if arbitrary parts of query plans can be shared, rather than only common prefixes as in previous approaches. Several heuristics are investigated and compared, in particular with respect to their complexity. Furthermore, it is shown how to extend the SPEX query engine to support such query plans for multiple queries. This extension proves to be both natural and efficient. An extensive experimental evaluation shows that, under a realistic cost function and for reasonable sets of queries, sharing arbitrary operators yields query plans with consistently lower cost than plans in which only common prefixes are shared. In most cases, the relative improvement is higher than 50%. Generating such query plans takes longer than generating plans that share only common prefixes, but the increase in time is within an acceptable margin.
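To make the prefix-sharing baseline mentioned in the abstract concrete, the following is a minimal sketch, not the SPEX plan representation. It assumes that each query is a plain list of path steps and that the cost of a plan is simply its number of operators; both are illustrative simplifications, and the function names and sample queries are hypothetical. Sharing arbitrary parts of plans, which is the subject of the work, is not covered by this sketch.

```python
from typing import Dict, List

# Hypothetical sample queries: each query is a list of path steps.
QUERIES: List[List[str]] = [
    ["child::a", "child::b", "child::c"],
    ["child::a", "child::b", "child::d"],
    ["child::a", "child::e"],
]


def count_unshared(queries: List[List[str]]) -> int:
    """Operators needed when every query gets its own separate plan."""
    return sum(len(query) for query in queries)


def count_prefix_shared(queries: List[List[str]]) -> int:
    """Operators needed when queries share their common prefixes.

    The queries are merged into a trie; each trie node stands for one
    operator that all queries passing through it reuse.
    """
    trie: Dict[str, dict] = {}
    operators = 0
    for query in queries:
        node = trie
        for step in query:
            if step not in node:
                node[step] = {}  # first query to need this step here
                operators += 1
            node = node[step]
    return operators


if __name__ == "__main__":
    print("separate plans:", count_unshared(QUERIES))
    print("prefix-shared plan:", count_prefix_shared(QUERIES))
```

With the sample queries, the separate plans need 8 operators in total, while the merged plan needs 5, because `child::a` and `child::b` are instantiated once and reused.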
Related resources
Optimizing XML data with view fragments
As web-based applications and data continue to grow, large caches of XML data will result in many application domains. In sensor web applications, there are continuous streams of sensor data being generated, converted to XML and stored for domain queries and data mining purposes. The main problem with these XML caches is that existing XML database queries are very slow, especially for large dat...
OptimAX: optimizing distributed continuous queries
Fulfilling the vision of a decentralized Web of peers requires efficient mechanisms for decentralized dissemination of information. RSS feeds are part of this vision: incremental updates to XML documents are pushed from a given producer to a set of subscribers along known paths. In this work, we envision processing continuous XML queries. Such queries are expressed in some XML query l...
Efficient Evaluation of Multiple Queries on Streamed XML Fragments
With the prevalence of Web applications, expediting multiple queries over streaming XML has become a core challenge due to one-pass processing and limited resources. The recently proposed Hole-Filler model consumes few resources for XML fragment transmission and evaluation; however, existing work addressed the multiple query problem over XML tuple streams instead of XML fragment streams. By taking advant...
Processamento de consultas baseadas em palavras-chave sobre fluxos XML
XML streams have become a relevant research topic due to the widespread use of applications such as online news, RSS feeds, and dissemination systems. Such streams must be processed rapidly and without retention. Retaining streams could cause data loss due to the large data traffic in continuous processing. This context becomes more complex when thousands of queries must be evaluated simultaneo...
Apply Uncertainty in Document-Oriented Database (MongoDB) Using F-XML
As we move into a big data world in which unstructured data grows at high velocity, a data store is needed to hold this large amount of data. Traditionally, relational databases have been used, but they are no longer suited to handling this volume of data, so a move to non-relational data stores is needed. In the current study, we have proposed an extension of the Mongo...